CLaC at SemEval-2016 Task 11: Exploring linguistic and psycho-linguistic Features for Complex Word Identification
Authors
Abstract
This paper describes the system submitted by the CLaC-EDLK team to the SemEval 2016 Complex Word Identification task. The goal of the task is to identify whether a given word in a given context is simple or complex. Our system relies on linguistic features and cognitive complexity. We used several supervised models; however, the Random Forest model outperformed the others. Overall, our best configuration achieved a G-score of 68.8% in the task, ranking our system 21st out of 45.
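For readers unfamiliar with the metric: the G-score used in this task is, as we understand the task definition, the harmonic mean of accuracy and recall on the complex class. The following is a minimal sketch under that assumption, with binary labels where 1 means complex. The function name `g_score` and the toy labels are illustrative only and do not come from the paper.

```python
def g_score(gold, predicted):
    """Harmonic mean of accuracy and recall (assumed label 1 = complex)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("label lists must not be empty")

    pairs = list(zip(gold, predicted))

    # Accuracy: share of all words labelled correctly.
    accuracy = sum(g == p for g, p in pairs) / len(pairs)

    # Recall: share of truly complex words that were predicted complex.
    actual_complex = sum(g == 1 for g in gold)
    found_complex = sum(g == 1 and p == 1 for g, p in pairs)
    recall = found_complex / actual_complex if actual_complex else 0.0

    if accuracy + recall == 0:
        return 0.0
    return 2 * accuracy * recall / (accuracy + recall)


# Toy example (hypothetical labels): accuracy = 0.75, recall = 0.5
toy_gold = [1, 0, 1, 0]
toy_pred = [1, 0, 0, 0]
print(round(g_score(toy_gold, toy_pred), 3))
```

Because recall enters the mean directly, a system that labels almost every word as simple can have high accuracy and still receive a low G-score.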
Similar resources
UWB at SemEval-2016 Task 11: Exploring Features for Complex Word Identification
In this paper, we present our system developed for the SemEval 2016 Task 11: Complex Word Identification. Our team achieved 3rd place among 21 participants. Our systems ranked 4th and 13th among 42 submitted systems. We proposed multiple features suitable for complex word identification, evaluated them, and discussed their properties. According to the results of our experiments, our final s...
USAAR at SemEval-2016 Task 11: Complex Word Identification with Sense Entropy and Sentence Perplexity
This paper describes an information-theoretic approach to complex word identification using a classifier built on an entropy-based measure over word senses and on sentence-level perplexity features. We describe the motivation behind these features based on information density and demonstrate that they perform modestly well in the complex word identification task in SemEval-2016. We also discus...
ECNU at SemEval-2016 Task 7: An Enhanced Supervised Learning Method for Lexicon Sentiment Intensity Ranking
This paper describes our system submissions to Task 7 in SemEval 2016, i.e., Determining Sentiment Intensity. We participated in the first two subtasks in English, which are to predict the sentiment intensity of a word or a phrase in English Twitter and General English domains. To address this task, we present a supervised learning-to-rank system to predict the relevant scores, i.e., the strength ...
IIIT at SemEval-2016 Task 11: Complex Word Identification using Nearest Centroid Classification
This paper describes the system that was submitted to SemEval-2016 Task 11: Complex Word Identification. It presents a preliminary investigation into word difficulty for non-native English speakers. We developed two systems using the Nearest Centroid Classification technique to distinguish complex words from simple words. Optimized over G-score, the presented solution obtained a G-score of...
CENTAL at SemEval-2016 Task 12: a linguistically fed CRF model for medical and temporal information extraction
In this paper, we describe the system developed for our participation in the Clinical TempEval task of SemEval 2016 (task 12). Our team focused on the subtasks of span and attribute identification from raw text and proposed a system that integrates both statistical and linguistic approaches. Our system is based on Conditional Random Fields with high-precision linguistic features.